
Conversation

@ffranr
Contributor

@ffranr ffranr commented Dec 2, 2025

Closes #1181


  • Treat integrated sub-servers as fatal to startup and return an error if any fail to start.
  • Propagate integrated sub-server startup errors to LiT so it stops launching and records the failure status.
  • Update docs to reflect integrated sub-servers are now critical to startup.

@ffranr ffranr self-assigned this Dec 2, 2025
@gemini-code-assist

Summary of Changes

Hello @ffranr, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the reliability of the LiT startup sequence by elevating the status of integrated sub-servers to critical. Previously, LiT could continue operating even if some of its integrated sub-servers failed to launch. Now, any failure in an integrated sub-server's boot process will cause LiT itself to cease its startup, ensuring that the system only reaches an operational state when all essential integrated components are functioning correctly. This change provides a more robust and predictable initialization experience.

Highlights

  • Critical Integrated Sub-Server Startup: Integrated sub-servers are now treated as critical components for LiT (Lightning Terminal) startup. If any integrated sub-server fails to initialize, LiT will now abort its launch process.
  • Error Propagation: The StartIntegratedServers function in the sub-server manager has been updated to return an error if any integrated sub-server fails to start, allowing LiT to properly handle and propagate these failures.
  • LiT Startup Robustness: LiT's main startup routine (terminal.go) now explicitly checks for errors returned by StartIntegratedServers and will set an errored status and halt its own startup if such failures occur.
  • Documentation Update: Comments within the subservers/interface.go file have been updated to reflect the new critical nature of integrated sub-servers during LiT startup.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request modifies the startup logic to treat integrated sub-server startup failures as fatal. The changes correctly propagate errors from StartIntegratedServers up to the main start function, causing LiT to fail on startup as intended. The documentation has also been updated to reflect this new behavior. The implementation is sound, but I have one suggestion to improve the consistency of error handling.

Contributor

@ViktorT-11 ViktorT-11 left a comment


Thanks for this @ffranr 🙏!

In addition to the feedback I've commented below, this new behaviour definitely needs itest coverage.

I think we actually started working on this at the same time, and I have a local branch with draft code implementing this plus itest coverage. If you want, I can clean that up and push it so that you can cherry-pick it, to make things simpler for you. Let me know if that'd be helpful :).

@lightninglabs-deploy

@ffranr, remember to re-request review from reviewers when ready

Member

@ellemouton ellemouton left a comment


Haven't actually looked at the diff here yet, just want to make a note in case: we should make sure that the daemon still runs, i.e. that the status server still gets served.

@ffranr ffranr force-pushed the wip/fail-startup-on-subserver-err branch 4 times, most recently from 1772099 to 0091958 on December 16, 2025 at 17:11
* Treat integrated sub-servers as fatal to startup and return an error
if any fail to start.
* Propagate integrated sub-server startup errors to LiT so it stops
launching and records the failure status.
* Update docs to reflect integrated sub-servers are now critical to
startup.
- Ensure critical integrated sub-servers initialize first.
- Introduce alphabetical sorting for consistent order across startup
  runs.
- Introduce tests for critical and non-critical sub-server startup
  behavior.
- Ensure failures in critical servers stop startup, while non-critical
  failures are tolerated.
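The ordering described in the commit message (critical sub-servers first, alphabetical within each group for consistent startup order) could look something like this sketch; the critical set and function names are assumptions for illustration, not the PR's actual identifiers:

```go
package main

import (
	"fmt"
	"sort"
)

// critical is an assumed stand-in for the PR's
// criticalIntegratedSubServers set.
var critical = map[string]bool{
	"taproot-assets": true,
}

// orderSubServers sorts critical integrated sub-servers first, and
// alphabetically within each group, so that startup order is the
// same on every run.
func orderSubServers(names []string) []string {
	ordered := append([]string(nil), names...)
	sort.SliceStable(ordered, func(i, j int) bool {
		ci, cj := critical[ordered[i]], critical[ordered[j]]
		if ci != cj {
			// Critical servers sort before non-critical ones.
			return ci
		}
		return ordered[i] < ordered[j]
	})
	return ordered
}

func main() {
	names := []string{"pool", "loop", "taproot-assets", "faraday"}
	fmt.Println(orderSubServers(names))
	// [taproot-assets faraday loop pool]
}
```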
@ffranr ffranr force-pushed the wip/fail-startup-on-subserver-err branch from 0091958 to 381d0b1 on December 18, 2025 at 00:17
Comment on lines +137 to +140
if criticalIntegratedSubServers.Contains(ss.Name()) {
	return fmt.Errorf("%s: %v", ss.Name(), err)
}

Member


But I think this will then result in a complete shutdown, which I don't think we want? We want the status server to remain running.

Contributor

@ViktorT-11 ViktorT-11 left a comment


Thanks for the updates 🚀! Leaving some additional feedback on top of @ellemouton's.


// testCriticalTapStartupFailure ensures LiT exits quickly when a critical
// integrated sub-server (tapd) fails to start during boot.
func testCriticalTapStartupFailure(ctx context.Context, net *NetworkHarness,
Contributor


Thanks for this 🙏! Similar to the unit tests, we should also have an itest which covers a startup when a non-critical sub-server errors during startup.

if err != nil {
	s.statusServer.SetErrored(ss.Name(), err.Error())

	if criticalIntegratedSubServers.Contains(ss.Name()) {
Contributor


Thanks 🙏!

Note that the commit message is outdated and does not cover the "criticalIntegratedSubServers" part.

Comment on lines +412 to +425
if client, err := g.basicLNDClient(); err == nil {
	stopCtx, cancel := context.WithTimeout(
		ctx, 5*time.Second,
	)
	defer cancel()

	_, err := client.StopDaemon(
		stopCtx, &lnrpc.StopRequest{},
	)
	if err != nil {
		log.Warnf("Error stopping lnd after failed "+
			"start: %v", err)
	}
}
Contributor


This will now attempt to shut down lnd for all types of startup errors, not just the "critical sub-server" error.

I'm not sure that this is exactly what we want, for two main reasons:

  1. This will apply in remote mode as well, and attempt to shut down the remote lnd node for any litd-related startup error.
  2. Even in integrated mode, it's not certain that we've gotten to executing the code that sets basicLNDClient by the time g.start errors, despite having started lnd. That means lnd will not be shut down in that scenario, which leads to quite unpredictable behaviour: lnd will sometimes be shut down when g.start errors, and sometimes not.

I therefore think we should only execute this if we specifically error during the startup of a critical litd sub-server, to keep the behaviour predictable and avoid affecting remote lnd nodes. You may also want to guard the shutdown request by requiring that litd is not running with a "remote" lnd node, even though the current critical sub-server startup error logic cannot occur when lnd is in remote mode.

err)
}

return startErr
Contributor


Similar to what @ellemouton has already commented, this will now shut down litd before shutdownInterceptor.ShutdownChannel() has been triggered, which we would like to avoid.

Comment on lines +427 to +430
if err := g.shutdownSubServers(); err != nil {
	log.Errorf("Error shutting down after failed start: %v",
		err)
}
Contributor


We should only do this after the shutdownInterceptor.ShutdownChannel() has been triggered, similar to the current logic below :)



Development

Successfully merging this pull request may close these issues.

Fail litd startup on tapd startup error when taproot-assets-mode=enable

5 participants